
    Deep Learning: Exemplar Studies in Natural Language Processing and Computer Vision

    Deep learning has become the most popular approach in machine learning in recent years, owing to the considerably high accuracies that deep learning methods obtain in many tasks, especially with textual and visual data. Natural language processing (NLP) and computer vision are the two research areas in which deep learning has demonstrated its impact most strongly. This chapter first summarizes the historical evolution of deep neural networks and their fundamental working principles. After briefly introducing the natural language processing and computer vision research areas, it explains how deep learning is used to solve problems in these two areas. Several examples of the common tasks in these research areas, together with some discussion, are also provided.

    DLT-Like Calibration of Central Catadioptric Cameras

    In this study, we present a calibration technique that is valid for all single-viewpoint catadioptric cameras. We are able to represent the projection of 3D points on a catadioptric image linearly with a 6 × 10 projection matrix, which uses lifted coordinates for the image and 3D points. This projection matrix can be computed from a sufficient number of 3D-2D correspondences (a minimum of 20 points distributed across three different planes). We show how to decompose it to obtain the intrinsic and extrinsic parameters. Moreover, we use this parameter estimation, followed by a non-linear optimization, to calibrate various types of cameras. Our results are based on the sphere camera model, which considers that every central catadioptric system can be modeled using two projections: one from 3D points to a unit sphere, and then a perspective projection from the sphere to the image plane. We tested our method both with simulations and real images.
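
    The linear core of such a DLT-like calibration can be sketched briefly: lift each homogeneous image point (3-vector) and 3D point (4-vector) to degree-2 Veronese coordinates (6 and 10 monomials, respectively), stack scale-elimination constraints for every correspondence, and take the null vector of the resulting system as the 6 × 10 matrix. Below is a minimal NumPy sketch of this generic linear step; the function names and monomial ordering are illustrative assumptions, not taken from the paper, and the decomposition and non-linear refinement stages are omitted.

        import numpy as np

        def lift3(q):
            # Degree-2 Veronese lift of a homogeneous image point (3-vector) -> 6-vector.
            x, y, w = q
            return np.array([x*x, x*y, x*w, y*y, y*w, w*w])

        def lift4(Q):
            # Degree-2 Veronese lift of a homogeneous 3D point (4-vector) -> 10-vector.
            X, Y, Z, W = Q
            return np.array([X*X, X*Y, X*Z, X*W, Y*Y, Y*Z, Y*W, Z*Z, Z*W, W*W])

        def dlt_lifted(points3d, points2d):
            # Estimate a 6x10 matrix P with lift3(q) ~ P @ lift4(Q) up to scale.
            rows = []
            for Q, q in zip(points3d, points2d):
                LQ, lq = lift4(Q), lift3(q)
                # Eliminate the unknown scale: lq[i]*(P @ LQ)[j] - lq[j]*(P @ LQ)[i] = 0
                for i in range(6):
                    for j in range(i + 1, 6):
                        row = np.zeros(60)
                        row[10*j:10*(j+1)] = lq[i] * LQ
                        row[10*i:10*(i+1)] -= lq[j] * LQ
                        rows.append(row)
            _, _, Vt = np.linalg.svd(np.vstack(rows))
            return Vt[-1].reshape(6, 10)   # null vector = smallest singular vector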

    User behaviour in web-based interactive virtual tours

    In this work, user behaviour characteristics were investigated for a web-based virtual tour application in which 360° panoramic images were used. The user has several options for navigating the museum (interactive floor plan, links in the images, and a pull-down menu). Written and audio information about the sections visited, detailed information for some artworks, and several control functions are provided on the webpage. 15 participants undertook the usability test and filled in a post-experiment questionnaire. The main research questions were: Which navigation option is preferred? At what rate are the written information area, the audio information option, and the extra artwork information used? Which means of control (mouse, keyboard, panel buttons) is preferred? Results showed that the floor plan is the most preferred way of changing location and the pull-down menu is the least preferred. Another finding is that the mouse is the most preferred means of control.

    Reduced egomotion estimation drift using omnidirectional views

    Estimation of camera motion from a given image sequence is a common task for multi-view 3D computer vision applications. Salient features (lines, corners, etc.) in the images are used to estimate the motion of the camera, also called egomotion. This estimation suffers from error build-up as the length of the image sequence increases, which causes a drift in the estimated position. In this letter, this phenomenon is demonstrated and an approach to improve the estimation accuracy is proposed. The main idea of the proposed method is to use an omnidirectional camera (360° horizontal field of view) in addition to a conventional (perspective) camera. Taking advantage of the correspondences between the omnidirectional and perspective images, the accuracy of camera position estimates can be improved. In our work, we adopt the sequential structure-from-motion approach, which starts by estimating the motion between the first two views and then adds more views one by one. We automatically match points between omnidirectional and perspective views. Point correspondences are used to estimate the epipolar geometry, followed by the reconstruction of 3D points with iterative linear triangulation. In addition, we calibrate our cameras using the sphere camera model, which covers both omnidirectional and perspective cameras. This enables us to treat the cameras in the same way at any step of structure-from-motion. We performed simulated and real image experiments to compare the estimation accuracy when only perspective views are used and when an omnidirectional view is added. Results show that the proposed idea of adding omnidirectional views reduces the drift in egomotion estimation.
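
    The iterative linear triangulation step mentioned above is a standard component that can be illustrated compactly. The sketch below follows the common Hartley-Sturm-style scheme of reweighting the DLT rows by the projective depth; this particular weighting is an assumption chosen for illustration, not quoted from the letter.

        import numpy as np

        def triangulate_linear(P1, P2, x1, x2):
            # DLT triangulation: each view contributes two rows of A @ X = 0,
            # where P1, P2 are 3x4 projection matrices and x1, x2 are (x, y) pixels.
            A = np.array([x1[0]*P1[2] - P1[0],
                          x1[1]*P1[2] - P1[1],
                          x2[0]*P2[2] - P2[0],
                          x2[1]*P2[2] - P2[1]])
            _, _, Vt = np.linalg.svd(A)
            X = Vt[-1]
            return X / X[3]                          # dehomogenize

        def triangulate_iterative(P1, P2, x1, x2, n_iter=10):
            # Reweight each view's rows by 1/(P[2] @ X) so the algebraic error
            # better approximates the reprojection error.
            X = triangulate_linear(P1, P2, x1, x2)
            for _ in range(n_iter):
                w1, w2 = P1[2] @ X, P2[2] @ X
                A = np.array([(x1[0]*P1[2] - P1[0]) / w1,
                              (x1[1]*P1[2] - P1[1]) / w1,
                              (x2[0]*P2[2] - P2[0]) / w2,
                              (x2[1]*P2[2] - P2[1]) / w2])
                _, _, Vt = np.linalg.svd(A)
                X = Vt[-1] / Vt[-1][3]
            return X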

    Reduced egomotion estimation drift using omnidirectional views

    Estimation of camera motion from a given image sequence degrades as the length of the sequence increases. In this letter, this phenomenon is demonstrated and an approach to increase the estimation accuracy is proposed. The proposed method uses an omnidirectional camera in addition to the perspective one and takes advantage of its enlarged view by exploiting the correspondences between the omnidirectional and perspective images. Simulated and real image experiments show that the proposed approach improves the estimation accuracy.
    Comment: Another publisher does not want this article to be shared at arxiv.org in order to publish it.

    Self-Supervised Contrastive Representation Learning in Computer Vision

    Although its origins date back a few decades, contrastive learning has recently gained popularity due to its achievements in self-supervised learning, especially in computer vision. Supervised learning usually requires a decent amount of labeled data, which is not easy to obtain for many applications. With self-supervised learning, we can use inexpensive unlabeled data and train on a pretext task. Such training helps us learn powerful representations. In most cases, the self-supervised model is then fine-tuned for a downstream task with the available amount of labeled data. In this study, we review common pretext and downstream tasks in computer vision and present the latest self-supervised contrastive learning techniques, which are implemented as Siamese neural networks. Lastly, we present a case study where self-supervised contrastive learning was applied to learn representations of semantic masks of images. Performance was evaluated on an image retrieval task, and the results reveal that, in accordance with the findings in the literature, fine-tuning the self-supervised training showed the best performance.
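
    As a concrete example of the contrastive objectives such Siamese-style networks optimise, here is a minimal PyTorch sketch of the NT-Xent (SimCLR-style) loss. This is one representative formulation assumed for illustration; the study itself may use a different objective.

        import torch
        import torch.nn.functional as F

        def nt_xent_loss(z1, z2, temperature=0.5):
            # z1[i] and z2[i] are embeddings of two augmented views of the same
            # image; every other sample in the batch acts as a negative.
            z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
            z = torch.cat([z1, z2], dim=0)                   # (2N, d)
            sim = z @ z.t() / temperature                    # scaled cosine similarities
            n = z1.size(0)
            eye = torch.eye(2 * n, dtype=torch.bool, device=z.device)
            sim = sim.masked_fill(eye, float('-inf'))        # exclude self-similarity
            # the positive for sample i is its other view at index (i + n) mod 2n
            targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
            return F.cross_entropy(sim, targets)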

    Corner validation based on extracted corner properties

    We developed a method to validate and filter a large set of previously obtained corner points. We derived the necessary relationships between image derivatives and estimates of corner angle, orientation, and contrast. Commonly used cornerness measures of the auto-correlation matrix estimates of image derivatives are expressed in terms of these estimated corner properties. A candidate corner is validated if the cornerness score directly obtained from the image is sufficiently close to the cornerness score for an ideal corner with the estimated orientation, angle, and contrast. We tested this algorithm on both real and synthetic images and observed that this procedure significantly improves corner detection rates based on human evaluations. We tested the accuracy of our corner property estimates under various noise conditions. Extracted corner properties can also be used for tasks like feature point matching, object recognition, and pose estimation.
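
    As a reference point for the "commonly used cornerness measures of the auto-correlation matrix", here is a minimal Python sketch of the standard Harris measure; the validation step itself (comparing the measured score against the score of an ideal corner with the estimated properties) is not reproduced here.

        import numpy as np
        from scipy.ndimage import gaussian_filter, sobel

        def harris_cornerness(image, sigma=1.5, k=0.04):
            # Harris cornerness from the smoothed auto-correlation (structure tensor) matrix.
            img = image.astype(float)
            Ix = sobel(img, axis=1)                 # horizontal image derivative
            Iy = sobel(img, axis=0)                 # vertical image derivative
            Sxx = gaussian_filter(Ix * Ix, sigma)   # second-moment matrix entries
            Syy = gaussian_filter(Iy * Iy, sigma)
            Sxy = gaussian_filter(Ix * Iy, sigma)
            det = Sxx * Syy - Sxy ** 2
            trace = Sxx + Syy
            return det - k * trace ** 2             # high values indicate corners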

    Joint optimisation for object class segmentation and dense stereo reconstruction

    The problems of dense stereo reconstruction and object class segmentation can both be formulated as Conditional Random Field based labelling problems, in which every pixel in the image is assigned a label corresponding to either its disparity or an object class such as road or building. While these two problems are mutually informative, no attempt has been made to jointly optimise their labellings. In this work we provide a principled energy minimisation framework that unifies the two problems and demonstrate that, by resolving ambiguities in real world data, joint optimisation of the two problems substantially improves performance. To evaluate our method, we augment the street view Leuven data set, producing 70 hand labelled object class and disparity maps. We hope that the release of these annotations will stimulate further work in the challenging domain of street-view analysis.
    This work was supported by EPSRC research grants, HMGCC, a TUBITAK researcher exchange grant, and the IST Programme of the European Community under the PASCAL2 Network of Excellence, IST-2007-216886.
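
    To make the joint formulation concrete, the toy NumPy sketch below evaluates a unified energy over a 4-connected grid: unary costs for both label fields, Potts smoothness on each field, and a simple term linking the two labellings. The specific potentials, including the "sky implies zero disparity" rule and its class index, are illustrative assumptions rather than the paper's actual energy.

        import numpy as np

        def joint_energy(obj_labels, disparities, unary_obj, unary_disp,
                         lam_smooth=1.0, lam_joint=1.0, sky_class=0):
            # obj_labels, disparities: (H, W) integer label maps
            # unary_obj:  (H, W, n_classes)     per-pixel object-class costs
            # unary_disp: (H, W, n_disparities) per-pixel disparity costs
            H, W = obj_labels.shape
            rows, cols = np.arange(H)[:, None], np.arange(W)[None, :]
            e = unary_obj[rows, cols, obj_labels].sum()
            e += unary_disp[rows, cols, disparities].sum()
            # Potts smoothness on both fields (horizontal and vertical edges)
            for lab in (obj_labels, disparities):
                e += lam_smooth * (lab[:, 1:] != lab[:, :-1]).sum()
                e += lam_smooth * (lab[1:, :] != lab[:-1, :]).sum()
            # joint term coupling the fields: sky pixels should have zero disparity
            e += lam_joint * ((obj_labels == sky_class) & (disparities != 0)).sum()
            return float(e)

    A real implementation would minimise such an energy with graph-cut based moves rather than merely evaluating it, but the coupling term is what distinguishes the joint formulation from solving the two problems separately.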